EVENT
Ordinal Semantic Segmentation Applied to Medical and Odontological Images
Event type: Qualifying Exam
Semantic segmentation consists of assigning a semantic label to each pixel of an image from a predefined set of classes. This process plays a fundamental role in semantic image understanding, enabling object identification and the analysis of spatial relationships between objects. Despite the advances achieved by deep learning-based approaches, most of these methods disregard the ordinal relationships that exist between classes. Such relationships often encode relevant domain knowledge, especially in scenarios where labels follow a natural ordering, and are essential for promoting more coherent, robust, and semantically consistent predictions [4, 5].
The study encompasses both parametric and non-parametric approaches, including hard constraints, implemented through architectural modifications, and soft constraints, formulated as regularization terms. Furthermore, the analysis considers strictly unimodal [4, 5], quasi-unimodal [6], and spatial [4, 6] scenarios, allowing different levels of flexibility in the modeling of ordinal distributions. Unimodal consistency requires that the probability distribution associated with each pixel strictly respect the ordering of the classes, that is, have a single peak with probabilities that do not increase as classes move away from it. In contrast, quasi-unimodal consistency relaxes this restriction by allowing small local variations in the distribution shape while still preserving the global ordinal structure. Spatial consistency, in turn, incorporates local contextual information by penalizing ordinal and semantic inconsistencies between neighboring pixels, favoring smoother, continuous, and semantically plausible transitions in the image domain.
In this context, the thesis proposal includes the following research topics:
1. Evaluate and adapt different methods from the literature to the context of ordinal semantic segmentation, analyzing the robustness of these approaches across different deep neural network architectures.
The objective is to investigate the behavior of these methods both in simpler architectures, such as U-Net [9] and MobileNet-V2 [10], and in hybrid and attention-based architectures, including MobileViTV2-Apple [11], SegFormer_Face [12], SegFormer_MITb0 [13], SegFormer_NVIDIA [13], and DPT_INTEL [14]. This architectural diversity enables the comparison of different feature extraction paradigms, generalization capabilities, and computational efficiencies.
2. Extend the analysis to more modern and complex architectures by incorporating foundation models, such as the Segment Anything Model (SAM) [15]. The motivation is the hypothesis that the rich representations obtained through large-scale pretraining can be fine-tuned to implicitly capture the ordinal topology present in the data.
3. Investigate the adaptation of recent language-guided ordinal learning strategies [17, 18] to ordinal semantic segmentation [19, 20]. In particular, we intend to explore ordinal prompt learning mechanisms integrated into multimodal models in order to explicitly incorporate semantic ordering relationships between classes through learned textual representations. The central hypothesis is that rank prompts and ordinally structured linguistic descriptions can induce more discriminative and semantically aligned visual representations, thereby improving both ordinal coherence and generalization when annotated data are scarce.
4. For parametric models, incorporate unimodal distributions directly into the output layer of the neural networks in order to explicitly enforce ordinal properties on the model predictions. In particular, we will investigate the Binomial Unimodal, Poisson Unimodal, and Gaussian Unimodal distributions, previously explored in the context of ordinal regression and ordinal classification [3, 7].
5. Adapt the ORD-ACL methods [7], originally developed for ordinal regression neural networks and later incorporated into the framework presented by Cardoso et al. [3], to ordinal semantic segmentation.
6. Adapt UnimodalNet, an architecture also proposed in [3] that ensures the neural network output preserves ordinal properties, by incorporating it into the previously mentioned architectures, including foundation models and Vision Transformer-based models.
7. In the context of parametric constraints, investigate the Unimodal Regularization proposed by Liu et al. [8], originally designed to enforce unimodal properties in ordinal classification tasks. The central hypothesis is that integrating this regularization strategy into ordinal semantic segmentation may encourage smoother, more stable, and ordinally consistent probability distributions throughout the spatial domain of the image, contributing to greater structural robustness and semantic coherence of the segmented masks.
8. Investigate the adaptation of the Wasserstein Unimodal method, also presented by Cardoso et al. [3], to ordinal semantic segmentation tasks.
9. Investigate different regularization terms introduced in the literature [5, 6], including CO2, which combines cross-entropy with a non-parametric regularization term called O2, as well as the Expanded Mean Squared Error (EXP_MSE) and the Quasi-unimodal Loss (QUL), originally developed for ordinal regression.
10. To enforce spatial consistency, explore functions already well established in the literature, such as Contact Surface Loss using Distance Transform (CSDT) and Contact Surface Loss using Neighbor Pixels (CSNP). As a novel contribution of this work, we further propose an adaptation of CSDT called Contact Surface Loss using Signed Distance Function (CSSDF), designed to enhance the topological and semantic coherence between adjacent regions.
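To make the idea behind item 4 concrete, the following is a minimal NumPy sketch of a binomial output layer applied per pixel, together with a check of the unimodality property. It is an illustration under stated assumptions and not the implementation used in the thesis: the function names `binomial_unimodal` and `is_unimodal`, the array shapes, and the tolerance are hypothetical, and the network is assumed to emit one value p in [0, 1] per pixel.

```python
import numpy as np
from math import comb


def binomial_unimodal(p, num_classes):
    """Map a per-pixel value p of shape (H, W) to class probabilities (H, W, K).

    Assumption: each pixel's distribution is Binomial(K - 1, p), which is
    unimodal over the ordered classes 0..K-1 by construction.
    """
    n = num_classes - 1
    k = np.arange(num_classes)
    coeff = np.array([comb(n, int(i)) for i in k], dtype=float)
    p = np.asarray(p, dtype=float)[..., None]  # broadcast over the class axis
    return coeff * p**k * (1.0 - p) ** (n - k)


def is_unimodal(probs, tol=1e-12):
    """Return True where the last-axis distribution rises to one peak, then falls."""
    diffs = np.diff(probs, axis=-1)
    signs = np.where(diffs > tol, 1, np.where(diffs < -tol, -1, 0))
    # A violation is any increase that happens after a decrease has been seen.
    seen_decrease = np.cumsum(signs == -1, axis=-1) > 0
    violation = (signs == 1) & seen_decrease
    return ~violation.any(axis=-1)


# Hypothetical 2x2 "image" of per-pixel values and 4 ordered classes.
p = np.array([[0.1, 0.5], [0.9, 0.3]])
probs = binomial_unimodal(p, num_classes=4)
print(probs.shape)               # (H, W, K)
print(is_unimodal(probs).all())  # every pixel passes the unimodality check
```

Because the ordering is enforced by the form of the output distribution rather than by a penalty term, a layer of this kind acts as a hard constraint, in contrast to the regularization terms of items 7, 9, and 10.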
The proposed approaches will be evaluated in different scenarios involving medical and dental images using the Breast Aesthetics [21], Cervix-MobileODT [22], Mobbio [23], Teeth-UCV [24], and Teeth-ISBI [25] datasets. All of these datasets pose naturally ordinal problems and allow the evaluation of the semantic, spatial, and ordinal coherence of the proposed approaches across different medical domains. In the medical context, the motivation for ordinal semantic segmentation is that several anatomical structures, pathological patterns, and clinical scales exhibit naturally ordered relationships. In many medical problems, classes are not independent but are organized according to progressive levels of severity, developmental stages, tissue depth, or continuous anatomical transitions. In dental imaging, the motivation is that teeth follow structured ordinal relationships within the dental arch, including anatomical ordering patterns associated with the upper and lower jaws, neighboring tooth positioning, and sequential spatial organization.
These ordinal structures provide relevant semantic and topological information that can be explicitly exploited during the segmentation process.
The methods can be organized as follows:
Parametric Hard Constraints: Binomial Unimodal, Poisson Unimodal, Gaussian Unimodal
Non-Parametric Hard Constraints: ORD-ACL, UnimodalNet
Parametric Soft Constraints: Unimodal Regularization (UR)
Non-Parametric Soft Constraints: CO2, QUL (Quasi-unimodal Loss), EXP_MSE (Expanded Mean Squared Error), CSNP (Contact Surface Loss using Neighbor Pixels), CSDT (Contact Surface Loss using Distance Transform), CSSDF (Contact Surface Loss using Signed Distance Function), Wasserstein Unimodal
As preliminary results, we implemented and evaluated the U-Net [9], MobileNet-V2 [10], MobileViTV2-Apple [11], SegFormer_Face [12], SegFormer_MITb0 [13], SegFormer_NVIDIA [13], and DPT_INTEL [14] architectures in order to compare the performance of the CO2, QUL, EXP_MSE, CSNP, and CSSDF regularization terms in the context of ordinal semantic segmentation. These experiments enabled the completion of items 9 and 10 of the thesis proposal, demonstrating satisfactory results in terms of ordinal coherence, spatial consistency, and segmentation quality. In addition to these architectures, we also investigated the use of the Segment Anything Model (SAM) foundation model through fine-tuning strategies combined with ordinal regularization terms. However, the results indicated that, for the evaluated scenarios, the regularization mechanisms employed were not sufficient to induce consistent ordinal properties in the model predictions, so the results were less promising than those of the previously analyzed architectures.
Online event
To attend, go to: meet.google.com/juv-qhyg-tyg
Start date: 03/08/2026 Time: 14:00 End date: 03/08/2026 Time: 17:00
Location: LNCC - Laboratório Nacional de Computação Científica - Virtual
Student: Mariana Dória Prata Lima - LNCC
Advisors: Gilson Antônio Giraldi - Laboratório Nacional de Computação Científica - LNCC; Jaime Cardoso - Instituto de Engenharia de Sistemas e Computação do Porto, INESC-Porto, Portugal
Examining Committee Members: Aristófanes Corrêa Silva - Universidade Federal do Maranhão - UFMA; Júlio de Castro Vargas Fernandes - Laboratório Nacional de Computação Científica - LNCC; Pablo Javier Blanco - Laboratório Nacional de Computação Científica - LNCC
Examining Committee Alternate: Jauvane Cavalcante de Oliveira - Laboratório Nacional de Computação Científica - LNCC


